- Title
- Development and external validation of automated ICD-10 coding from discharge summaries using deep learning approaches
- Creator
- Ponthongmak, Wanchana; Thammasudjarit, Ratchainant; McKay, Gareth J.; Attia, John; Theera-Ampornpunt, Nawanan; Thakkinstian, Ammarin
- Relation
- Informatics in Medicine Unlocked Vol. 38, no. 101227
- Publisher Link
- http://dx.doi.org/10.1016/j.imu.2023.101227
- Publisher
- Elsevier
- Resource Type
- journal article
- Date
- 2023
- Description
- Objectives: To develop an automated International Classification of Diseases (ICD) coding tool using natural language processing (NLP) and discharge summary texts from Thailand. Materials and methods: The development phase included 15,329 discharge summaries from Ramathibodi Hospital from January 2015 to December 2020. The external validation phase used Medical Information Mart for Intensive Care III (MIMIC-III) data. Three algorithms were developed: naïve Bayes with term frequency-inverse document frequency (NB-TF-IDF), a convolutional neural network with neural word embedding (CNN-NWE), and a CNN with PubMedBERT (CNN-PubMedBERT). Two state-of-the-art models were also considered: convolutional attention for multi-label classification (CAML) and pretrained language models for automatic ICD coding (PLM-ICD). Results: The CNN-PubMedBERT model achieved average micro- and macro-areas under the precision-recall curve (AUPRC) of 0.6605 and 0.5538, respectively. It outperformed NB-TF-IDF (0.4441 and 0.3562) and CAML (0.6257 and 0.4964) on both measures, with differences of 0.2164 and 0.1976, and 0.0348 and 0.0574, respectively. Against CNN-NWE (0.6528 and 0.5564), it was higher on micro-AUPRC (difference 0.0077) but marginally lower on macro-AUPRC (difference −0.0026). However, CNN-PubMedBERT performed less well than PLM-ICD, which achieved micro- and macro-AUPRCs of 0.7202 and 0.5865. The CNN-PubMedBERT model was externally validated on two subsets of MIMIC-III, the MIMIC-ICD-10 and MIMIC-ICD-9 datasets, which contained 40,923 and 31,196 discharge summaries, respectively. The average micro-AUPRCs were 0.3745, 0.6878, and 0.6699 for direct prediction on MIMIC-ICD-10, fine-tuning on MIMIC-ICD-10, and fine-tuning on MIMIC-ICD-9, respectively; the corresponding average macro-AUPRCs were 0.2819, 0.4219, and 0.5377.
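The NB-TF-IDF baseline named in the Materials and methods can be illustrated with a minimal pure-Python sketch. Because the abstract does not give implementation details, it assumes that multi-label coding is handled one-vs-rest (one binary multinomial naïve Bayes per ICD code), that documents are already tokenized, and that TF-IDF uses smoothed IDF with L2 normalization (scikit-learn's defaults) plus Laplace smoothing with alpha=1. The class name OneVsRestTfidfNB and the toy tokens and codes are hypothetical.

```python
import math
from collections import Counter


class OneVsRestTfidfNB:
    """Hypothetical multi-label baseline: one binary multinomial NB per ICD code,
    trained on L2-normalized TF-IDF weights (an assumption, not the paper's code)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def _vec(self, doc):
        # TF-IDF for one tokenized document; terms unseen in training are dropped.
        tf = Counter(t for t in doc if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of sets of ICD codes per document.
        n = len(docs)
        df = Counter(t for doc in docs for t in set(doc))
        # Smoothed IDF, the formula scikit-learn's TfidfVectorizer uses by default.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        vocab = list(self.idf)
        X = [self._vec(doc) for doc in docs]
        self.models = {}
        for code in sorted(set().union(*labels)):
            stats = {}
            for has_code in (True, False):
                rows = [x for x, y in zip(X, labels) if (code in y) == has_code]
                weight = Counter()
                for x in rows:
                    weight.update(x)  # TF-IDF weights act as fractional term counts
                total = sum(weight.values()) + self.alpha * len(vocab)
                log_lik = {t: math.log((weight[t] + self.alpha) / total) for t in vocab}
                log_prior = math.log((len(rows) + 1) / (n + 2))  # add-one smoothed prior
                stats[has_code] = (log_prior, log_lik)
            self.models[code] = stats
        return self

    def predict_proba(self, doc):
        # Returns {code: P(code assigned | doc)} from each code's binary NB.
        x = self._vec(doc)
        probs = {}
        for code, stats in self.models.items():
            score = {k: prior + sum(w * lik[t] for t, w in x.items())
                     for k, (prior, lik) in stats.items()}
            m = max(score.values())  # subtract the max for numerical stability
            pos, neg = math.exp(score[True] - m), math.exp(score[False] - m)
            probs[code] = pos / (pos + neg)
        return probs


# Toy usage with made-up tokens and codes (illustrative only).
docs = [["chest", "pain", "troponin"], ["cough", "fever", "pneumonia"],
        ["chest", "pain", "ecg"], ["fever", "cough", "xray"]]
labels = [{"I21"}, {"J18"}, {"I21"}, {"J18"}]
model = OneVsRestTfidfNB().fit(docs, labels)
p = model.predict_proba(["chest", "pain"])
print(p)
```

Feeding TF-IDF weights to a multinomial model as fractional counts is a common practical shortcut rather than a strict probabilistic model; a production version would more likely use sparse matrices, for example scikit-learn's TfidfVectorizer and MultinomialNB wrapped in OneVsRestClassifier.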
Discussion: CNN-PubMedBERT performed second-best, after PLM-ICD, with a considerable gap between average micro- and macro-AUPRC, especially in external validation. This generally indicates good overall prediction but limited predictive value for codes with small sample sizes. After fine-tuning, external validation in a US cohort yielded higher micro-AUPRC than in the development data. Conclusion: Both PLM-ICD and CNN-PubMedBERT may be useful tools for automated ICD-10 coding. Nevertheless, further evaluation and validation within Thai and other Asian healthcare systems may be more informative for clinical application.
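The Discussion attributes the gap between micro- and macro-AUPRC to codes with small sample sizes. A minimal pure-Python sketch of the two averages shows the mechanism: micro-averaging pools every (document, code) pair, so frequent codes dominate, whereas macro-averaging weights each code equally, so a few poorly ranked rare codes pull it down. AUPRC is approximated here by step-wise average precision; the paper's exact estimator is not stated in this record, and the function names and toy scores are hypothetical.

```python
def average_precision(y_true, y_score):
    """Step-wise AUPRC: mean precision at the rank of each positive.
    Tied scores keep their input order (scikit-learn groups ties instead)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined when there are no positives")
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def micro_macro_auprc(Y_true, Y_score):
    """Y_true / Y_score: one row per document, one column per ICD code."""
    # Micro: pool every (document, code) pair, so frequent codes dominate.
    micro = average_precision([v for row in Y_true for v in row],
                              [v for row in Y_score for v in row])
    # Macro: score each code separately, then take an unweighted mean.
    per_code = []
    for j in range(len(Y_true[0])):
        col_true = [row[j] for row in Y_true]
        if any(col_true):  # skip codes with no positive document
            per_code.append(average_precision(col_true, [row[j] for row in Y_score]))
    return micro, sum(per_code) / len(per_code)


# Hypothetical toy data: code A is frequent and ranked well,
# code B has a single positive that is ranked last.
Y_true = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 0]]
Y_score = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.3], [0.7, 0.1], [0.2, 0.05], [0.1, 0.6]]
micro, macro = micro_macro_auprc(Y_true, Y_score)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.883 macro=0.583
```

In the toy data, the well-ranked frequent code keeps micro-AUPRC high, while the single misranked rare code drags macro-AUPRC down, mirroring the pattern the Discussion describes.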
- Subject
- deep learning; natural language processing; international classification of diseases; patient discharge summaries; SDG 17; Sustainable Development Goals
- Identifier
- http://hdl.handle.net/1959.13/1486793
- Identifier
- uon:51955
- Identifier
- ISSN:2352-9148
- Language
- eng
- Reviewed